Conventionally, a configuration has been proposed in which a sound generated in the surrounding area is picked up and processed, the picked-up sound and the processed sound are mixed together, and the mixed sound is output from a loudspeaker, so that the listener hears a sound different from the sound generated in the surrounding area (for example, see Patent Document 1). With this configuration, the sound generated in the surrounding area (for example, the voice of a speaker) becomes difficult to hear, and the voice of the speaker can thereby be masked.
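The signal flow of such a conventional configuration (pick up, process, mix, output) can be illustrated with a minimal sketch. The processing applied in Patent Document 1 is not specified here, so the sketch assumes, purely for illustration, that each short frame of the picked-up sound is reversed in time before mixing. The names `scramble` and `mask_signal` and the parameters `frame_len` and `gain` are hypothetical and do not come from the cited document.

```python
from typing import List


def scramble(frame: List[float]) -> List[float]:
    """Hypothetical processing step (an assumption): reverse the frame in time."""
    return frame[::-1]


def mask_signal(picked_up: List[float], frame_len: int = 4, gain: float = 0.5) -> List[float]:
    """Mix the picked-up sound with a processed copy of itself, frame by frame.

    picked_up: samples of the sound picked up from the surrounding area.
    frame_len: number of samples per processed frame (assumed value).
    gain:      weight of the processed sound in the mix, from 0.0 to 1.0 (assumed value).
    The returned samples are what would be sent to the loudspeaker.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    if not 0.0 <= gain <= 1.0:
        raise ValueError("gain must be between 0.0 and 1.0")

    mixed: List[float] = []
    for start in range(0, len(picked_up), frame_len):
        frame = picked_up[start:start + frame_len]   # picked-up sound
        processed = scramble(frame)                  # processed sound
        # Weighted sum of the picked-up sound and the processed sound.
        mixed.extend((1.0 - gain) * x + gain * p for x, p in zip(frame, processed))
    return mixed
```

For example, `mask_signal([1.0, 2.0, 3.0, 4.0], frame_len=4, gain=0.5)` averages the frame with its time-reversed copy, so the output no longer follows the original waveform. With `gain=0.0` the picked-up sound passes through unchanged.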